DETAILED ACTION

This action is in response to the amendment filed May 12, 2008. Claims 1-17 are
pending, with claims 8 and 17 canceled, claims 1 and 9 amended, and claims 2-7 and 
10-16 original. 

Response to Arguments 

Applicant's arguments filed May 12, 2008 have been fully considered but they are 
not persuasive. 

Applicant argues that, "the language model generally incorporates the set of 
words and sentences, which is not related to speech rules, so the language model 
should not be corresponded to the speech rule database" (Remarks page 6); however 
the examiner respectfully disagrees. During speech recognition, a spoken word is 
matched to an acoustic model to determine the best acoustic word match. The chosen 
words are then compared to a language model to determine the best word based on 
the context of the speech and language and grammar rules. The language model is 
generally derived from an analysis of the language, including parts-of-speech, syntax 
and semantics. Therefore the language model naturally incorporates the speech rules 
associated with that language. 
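The two-stage matching described above can be illustrated with a minimal sketch. It is not taken from D'Hoore or from the instant application; the candidate words, the acoustic scores, the bigram probabilities, and the function name `best_sequence` are all hypothetical values chosen only to show how a language model can settle a choice that the acoustic match alone leaves open.

```python
import math
from itertools import product

# Hypothetical acoustic log-scores for the candidate words at each position.
# "see" and "sea" sound alike, so the acoustic model alone cannot separate them.
acoustic_candidates = [
    {"I": math.log(0.9), "eye": math.log(0.1)},
    {"see": math.log(0.5), "sea": math.log(0.5)},
]

# Hypothetical bigram language model: P(word | previous word).
bigram = {
    ("<s>", "I"): 0.6, ("<s>", "eye"): 0.01,
    ("I", "see"): 0.3, ("I", "sea"): 0.001,
    ("eye", "see"): 0.01, ("eye", "sea"): 0.01,
}

def best_sequence(candidates, lm, floor=1e-6):
    """Score every candidate path as acoustic log-score plus LM log-probability."""
    best, best_score = None, -math.inf
    for words in product(*[c.keys() for c in candidates]):
        score, prev = 0.0, "<s>"
        for pos, word in enumerate(words):
            # Unseen bigrams fall back to a small floor probability.
            score += candidates[pos][word] + math.log(lm.get((prev, word), floor))
            prev = word
        if score > best_score:
            best, best_score = list(words), score
    return best, best_score

words, score = best_sequence(acoustic_candidates, bigram)
print(words)
```

In this toy setup the acoustic scores for "see" and "sea" are tied, and the word-order statistics in the language model decide between them.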

Applicant additionally argues that, "Although D'Hoore has disclosed most 
technical features of the present invention, the locating and comparison of the candidate 
data sets further referring the connecting sequence of the speech features and the 
speech database is not disclosed" (Remarks page 6, and repeated on page 7); 
however the examiner respectfully disagrees. D'Hoore discloses processing an input
speech signal into spectral features, then comparing those features to an acoustic 
model and then to a language model (candidate data sets) (column 4 lines 42-45) - the 
acoustic model consisting of biphone or triphone units based on Hidden Markov Models 
(column 3 lines 24-25). Biphone (or diphone) and triphone HMMs are used to model
connecting sequences of sounds, for example one phoneme together with either its
previous phoneme (diphone) or its two previous phonemes (triphone). In addition, as noted above,
the language model is derived from an analysis of the language, and therefore naturally 
incorporates speech rules. Therefore D'Hoore discloses "locating and comparing a
plurality of candidate data sets corresponding to the speech features, referring the
connecting sequences of the speech features (HMM biphone or triphone models) and a
speech rule database (language model)" as recited in claim 1.
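As an illustration only, the notion of a "connecting sequence" of phonemes can be sketched as follows. The function `context_units`, the padding symbol `sil`, and the sample phoneme string are hypothetical and do not come from D'Hoore. The sketch follows the description given above literally (a phoneme plus one or two preceding phonemes); some toolkits instead define a triphone by one left and one right neighbour.

```python
def context_units(phonemes, n_prev, pad="sil"):
    """Pair each phoneme with its n_prev preceding phonemes.

    n_prev=1 gives biphone (diphone) units and n_prev=2 gives the
    two-previous-phoneme units described in the text. The start of the
    utterance is padded with a hypothetical silence symbol.
    """
    padded = [pad] * n_prev + list(phonemes)
    return [tuple(padded[i:i + n_prev + 1]) for i in range(len(phonemes))]

phones = ["h", "e", "l"]
print(context_units(phones, 1))  # [('sil', 'h'), ('h', 'e'), ('e', 'l')]
print(context_units(phones, 2))  # [('sil', 'sil', 'h'), ('sil', 'h', 'e'), ('h', 'e', 'l')]
```

Each tuple is one context-dependent unit, so a model built over such units necessarily encodes which sound follows which.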

Applicant argues that, "the present invention and D'Hoore are implemented by 
different models, respectively, and, further, Applicant believes that the technical features 
of claim 2 are distinguishable over D'Hoore. Thus, the limitations of the present invention
are not disclosed in the D'Hoore citation" (Remarks page 7); however the examiner 
respectfully disagrees. Biphone, as used in D'Hoore, and diphone, as used in the 
instant application, are terms used synonymously to refer to a two phoneme model.
Since these two terms mean essentially the same thing, the invention and D'Hoore are 
not implemented by different models, thus the technical features of claim 2 are not 
distinguishable over the prior art.

Applicant argues that, "The multi-lingual context-speech mapping data has no
relationship with the context dependent acoustic models and generation steps thereof 
are different that the context dependent acoustic models, which is generated by a multi- 
lingual baseform generation engine and a cross-lingual diphone model generation 
engine" (Remarks pages 7 and 8); however, this argument fails to comply with 37
CFR 1.111(b) because it amounts to a general allegation that the claims define a
patentable invention without specifically pointing out how the language of the claims 
patentably distinguishes them from the references. Multi-lingual context dependent 
acoustic models of D'Hoore are used to map to an input speech sequence. Applicant 
has not provided sufficient evidence pointing out how the multi-lingual context 
dependent acoustic models differ from the multi-lingual context-speech mapping data,
as disclosed in claims 4 and 12. 

Applicant argues that, "Although D'Hoore discloses single language and multi- 
language acoustic models, training context independent models using Viterbi training of 
discrete density HMM's, merging the context dependent and context independent 
phoneme models, and smoothing the trained context dependent models with the 
context independent models, but those steps are not related to that of the present 
invention, comprising fixing left contexts and mapping right contexts to obtain a mapping 
result, fixing right context and mapping the left contexts to obtain the mapping result, 
and obtaining the multi-lingual context-speech mapping data according to the mapping 
result. Thus, the limitations of the present invention are not disclosed in D'Hoore citation 
and claim 14 is novel based on the features of D'Hoore and should be allowable" 
(Remarks page 8 and 9) as well as that, "Although D'Hoore and Waibel apply various
models, the present invention applies the multi-lingual anti-models different from those 
model is D'Hoore and Waibel. Different models may result in different effects for speech 
recognition. Therefore, Applicant believes the assertion by the examiner that the models 
of the present invention have been disclosed in D'Hoore and Waibel is unreasonable" 
(Remarks page 10 and page 11); however, these arguments fail to comply with 37
CFR 1.111(b) because they amount to a general allegation that the claims define a
patentable invention without specifically pointing out how the language of the claims 
patentably distinguishes them from the references. 

Applicant additionally argues that, "the uni-lingual anti-model generation engine 
receiving the multi-lingual query commands to generate the plurality of uni-lingual
anti-models and the anti-model combination engine calculating the uni-lingual anti-models
to generate the multi-lingual anti-models are also not disclosed in D'Hoore and Waibel"
(Remarks page 11); however the examiner respectfully disagrees. D'Hoore discloses
single language and multi-language models for speech recognition created during 
training (column 4 lines 63-67). In addition, Waibel discloses the use of garbage 
models, or anti-models, to model nonstationary human noise. Therefore it would have 
been obvious for one of ordinary skill to have an anti-model generation engine that 
receives a plurality of multi-lingual query commands to generate a plurality of uni-lingual 
anti-models corresponding to specific languages, since one of ordinary skill has good 
reason to pursue the options within his or her technical grasp in order to achieve the 
predictable result of improving speech recognition by removing unwanted noise from the
signal. 

Claim Rejections - 35 USC § 102 

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

Claims 1-4, 8-12, 14 and 17 are rejected under 35 U.S.C. 102(b) as being 
anticipated by D'hoore (6,085,160). 

1. As per claims 1 and 9, D'hoore discloses a system for multi-lingual speech
recognition, comprising: 

a speech modeling engine, receiving and transferring a mixed multi-lingual 
speech signal into a plurality of speech features (column 2 lines 7-13 and column 3 lines 
34-40); 

a speech search engine, coupled to the speech modeling engine, receiving the 
speech features, and locating and comparing a plurality of candidate data sets 
corresponding to the speech features, referring to connecting sequences of the speech 
features and a speech rule database, to find match probability of a plurality of candidate 
speech models of the candidate data sets (column 4 lines 42-45, feature vectors are 
compared to acoustic models (connecting sequences) and then to a language model
(speech rule database) to determine the best match); and

a decision reaction engine, coupled to the speech search engine, selecting a
plurality of resulting speech models corresponding to the speech features according to 
the match probability from the candidate speech models to generate a speech
command (column 4 lines 42-45, feature vectors are compared to acoustic models and 
then to a language model to determine the best match). 

2. As per claims 2 and 10, D'hoore discloses the system as claimed in claims 1
and 9, wherein the speech models are characterized by diphone models (column 3 lines 
24-25). 

3. As per claims 3 and 11, D'hoore discloses the system as claimed in claims 1
and 9, wherein the speech searching engine locates and compares the candidate data 
sets by referring a multi-lingual model database (column 3 lines 4-15 and column 4 lines 
42-45, feature vectors are compared to acoustic models to determine the best match, 
where the database of acoustic models contains speech from several languages). 

4. As per claims 4 and 12, D'hoore discloses the system as claimed in claims 3
and 11, wherein the multi-lingual model database comprises multi-lingual
context-speech mapping data (column 4 line 63 - column 5 line 14, context dependent
acoustic models are trained and used for recognition).



6. As per claim 14, D'hoore discloses the method as claimed in claim 13, wherein 
selection and combination further comprises the steps of: fixing left contexts of the 
multi-lingual baseforms and mapping right contexts of the multi-lingual baseforms to 
obtain a mapping result; fixing right context and mapping the left contexts of the multi- 
lingual baseforms to obtain the mapping result if the right contexts of the multi-lingual 
baseforms mapping fails; and obtaining the multi-lingual context-speech mapping data 
according to the mapping result (column 4 line 63 - column 5 line 14, context 
dependent biphone acoustic models are trained and used for recognition. Since the
acoustic models used are biphone models, it is inherent that left and right contexts
and mapping are used).
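For illustration only, one possible reading of the claimed steps (fix the left context and map the right context; if that fails, fix the right context and map the left) is sketched below. This is an assumption about how such a fallback could be organized, not the method of the instant application or of D'hoore. The function `map_diphone`, the diphone inventory, and the phoneme substitution table are all hypothetical.

```python
def map_diphone(left, right, inventory, substitutes):
    """Map a (left, right) phoneme pair onto a diphone that exists in the inventory.

    Step 1 keeps the left context fixed and tries substitutes for the right.
    Step 2 runs only if step 1 fails: it keeps the right context fixed and
    tries substitutes for the left. Returns None when both steps fail.
    """
    for candidate in substitutes.get(right, []):
        if (left, candidate) in inventory:
            return (left, candidate)
    for candidate in substitutes.get(left, []):
        if (candidate, right) in inventory:
            return (candidate, right)
    return None

# Hypothetical target-language diphone inventory and cross-lingual substitutes.
inventory = {("a", "t"), ("d", "o")}
substitutes = {"th": ["t", "d"]}

print(map_diphone("a", "th", inventory, substitutes))  # ('a', 't')
print(map_diphone("th", "o", inventory, substitutes))  # ('d', 'o')
```

The first call succeeds in step 1, and the second succeeds only through the step 2 fallback.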

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made.

Claims 5 and 13 are rejected under 35 U.S.C. 103(a) as being unpatentable over
D'hoore in view of Burns (5,454,106). 

7. D'hoore discloses the system as claimed in claim 4, further comprising: 

a multi-lingual baseform mapping engine, comparing a plurality of multi-lingual 
inputs to obtain a plurality of multi-lingual baseforms (column 3 lines 9-18, the system 
recognizes phonemes or phoneme like units, therefore it is inherent that the system first 
performs tokenization, or obtains baseforms); and

a cross-lingual diphone model generation engine, coupled to the multi-lingual 
baseform mapping engine, selecting and combining the multi-lingual baseforms to 
generate the multi-lingual context-speech mapping data (column 3 lines 22-25 and 
column 4 line 63 - column 5 line 14, context dependent biphone acoustic models are 
trained and used for recognition). 

However, D'hoore does not disclose comparing a plurality of multi-lingual query 
commands to obtain a plurality of multi-lingual baseforms. Burns discloses inputting 
query commands to a speech recognizer, which are then sent to be scanned by a 
tokenizer (column 4 lines 20-29). Burns discloses a system that enables a user to 
retrieve information from a database using natural language queries (column 3 lines 10- 
15). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to compare a plurality of multi-lingual query commands to obtain a 
plurality of multi-lingual baseforms in D'hoore, since one of ordinary skill in the art has
good reason to pursue the options within his or her technical grasp in order to achieve 
the predictable result of producing a multi-lingual speech recognition system optimized 
for a variety of recognition tasks. 

Claims 6, 7, 15 and 16 are rejected under 35 U.S.C. 103(a) as being unpatentable
over D'hoore in view of Waibel ("Interactive Translation of Conversational Speech" 
IEEE 1996). 

8. As per claims 6 and 15, D'hoore discloses the system as claimed in claims 3 
and 11; however, D'hoore does not disclose wherein the multi-lingual model database
comprises a plurality of multi-lingual anti-models. Waibel discloses a system for speech 
recognition which uses garbage models to model nonstationary noises (page 44, 
second paragraph). These garbage models, also known as anti-models, are used to 
model common non-speech noises, such as coughs and lip-smacking, and non-human
noises, such as door slams and ringing telephones.

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use anti-models in D'hoore, since one of ordinary skill in the art has 
good reason to pursue the options within his or her technical grasp in order to achieve
the predictable result of removing non-speech and background noises, thus improving
the overall recognition accuracy. 

9. As per claims 7 and 16, D'hoore in view of Waibel discloses the system as
claimed in claims 6 and 15, but D'hoore does not disclose at least one uni-lingual anti- 
model generation engine, receiving a plurality of multi-lingual query commands to 
generate a plurality of uni-lingual anti-models corresponding to specific languages; and 
an anti-model combination engine, coupled to the uni-lingual anti-model generation 
engine, calculating the uni-lingual anti-models to generate the multi-lingual anti-models. 
However, D'hoore does disclose receiving multi-lingual speech input and training multi- 
lingual acoustic models (column 4 line 63- column 5 line 14). In addition, Waibel 
discloses a system for speech recognition which uses garbage models to model non- 
stationary noises (page 44, second paragraph). These garbage models, also known as 
anti-models, are used to model common non-speech noises, such as coughs and
lip-smacking, and non-human noises, such as door slams and ringing telephones.

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to receive a plurality of multi-lingual query commands to generate a
plurality of uni-lingual anti-models corresponding to specific languages, and use an anti- 
model combination engine, coupled to the uni-lingual anti-model generation engine, to 
calculate the uni-lingual anti-models to generate the multi-lingual anti-models in 
D'hoore, since one of ordinary skill in the art has good reason to pursue the options 
within his or her technical grasp in order to achieve the predictable result of removing
non-speech and background noises, thus improving the overall recognition accuracy. 

10. The prior art made of record and not relied upon is considered pertinent to
applicant's disclosure. Refer to PTO-892, Notice of References Cited for a listing of 
analogous art. 

11. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Greg Borsetti whose telephone number is 571-270- 
3885. The examiner can normally be reached on Mon-Thur 9:30am-5:30pm. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil can be reached on 571-272-7602. The fax phone 
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

8/5/2008 /Talivaldis Ivars Smits/ 

Primary Examiner, Art Unit 2626 

/G. A. B./ 

Examiner, Art Unit 2626 



